Abstract

Schoenberg transformations, mapping Euclidean configurations into Euclidean
configurations, define in turn a transformed inertia, whose minimization
produces robust location estimates. The procedure only depends upon
Euclidean distances between observations, and applies equivalently to uni- 
variate and multivariate data. The choice of the family of transformations 
and their parameters defines a flexible location strategy, generalizing M- 
estimators. Two regimes of solutions are identified. Theoretical results on 
their existence and stability are provided, and illustrated on two data sets. 

Keywords: Correspondence Analysis, Euclidean distances, Huber function,
Huygens principles, M-estimators, Schoenberg transformations, Tukey
bisquare



1 Introduction and main result 

This paper investigates the properties of presumably new location estimates, 

defined through Schoenberg transformations, which map initial Euclidean 
distances into new Euclidean distances (Schoenberg 1938; Bavaud 2011). 

Specifically, consider a univariate sample $\{x_i\}_{i=1}^n$ with weights $\{f_i\}_{i=1}^n$.
The weighted mean $a = \bar{x}_f := \sum_i f_i x_i$ minimizes $\sum_i f_i (x_i - a)^2$, and the
weighted median $a = x_{0.5}$ minimizes $\sum_i f_i |x_i - a|$. This paper introduces
and studies a family of centroids $a$, suitable for any dimension, generalizing
the previous well-known basic estimates. The centroids are defined as the
averages $a = \sum_i \alpha_i x_i$ minimizing the transformed inertia

$$\Gamma(a) = \sum_{i=1}^n f_i\,\varphi(D_{ia}) \qquad (1)$$

where $D_{ia} = \|x_i - a\|^2$ is a squared Euclidean distance and $\varphi(D)$ a Schoenberg
transformation (see Section 2.2).



The profile $\alpha$ generating minimizers $a = \sum_i \alpha_i x_i$ turns out to satisfy

$$\alpha_i = \frac{f_i\,\varphi'(D_{ia})}{\sum_j f_j\,\varphi'(D_{ja})} \qquad\qquad D_{ia} = \sum_j \alpha_j D_{ij} - \frac{1}{2}\sum_{jk} \alpha_j \alpha_k D_{jk}\,. \qquad (2)$$

This pair of identities defines an iterative scheme, depending on the distances
between observations only, and converging in general towards a local minimum
of (1). The term $\varphi'(D_{ia})$ downweights observations distant from the
centroid $a$, which behaves as a multidimensional robust estimate of location,
comparable to the M-estimates in the one-dimensional case.
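
The iterative scheme defined by the two identities (2) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the function name `schoenberg_centroid`, the starting profile $\alpha = f$, the stopping rule and the four-point sample are all hypothetical choices, not taken from the paper, and the transformation is assumed to have a finite derivative at $D = 0$.

```python
def schoenberg_centroid(D, f, dphi, tol=1e-12, max_iter=10_000):
    """Iterate the two identities (2), starting from the profile alpha = f.

    D    : n x n nested list of squared Euclidean distances D_ij
    f    : weights f_i > 0 summing to one
    dphi : derivative phi'(D) of the transformation (finite at D = 0)
    """
    n = len(f)
    alpha = list(f)
    for _ in range(max_iter):
        # Second identity: D_ia = sum_j alpha_j D_ij - (1/2) sum_jk alpha_j alpha_k D_jk
        row = [sum(alpha[j] * D[i][j] for j in range(n)) for i in range(n)]
        half = 0.5 * sum(alpha[i] * row[i] for i in range(n))
        Dia = [max(r - half, 0.0) for r in row]  # clip rounding noise below zero
        # First identity: alpha_i proportional to f_i phi'(D_ia)
        w = [f[i] * dphi(Dia[i]) for i in range(n)]
        total = sum(w)
        new_alpha = [wi / total for wi in w]
        shift = max(abs(p - q) for p, q in zip(new_alpha, alpha))
        alpha = new_alpha
        if shift < tol:
            break
    return alpha


# Hypothetical univariate sample with one outlier, uniform weights.
x = [0.0, 1.0, 2.0, 10.0]
f = [0.25] * 4
D = [[(xi - xj) ** 2 for xj in x] for xi in x]


def dphi(d, delta=1.0):
    """Derivative of the logarithmic transformation ln(1 + D/delta)."""
    return 1.0 / (delta + d)


alpha = schoenberg_centroid(D, f, dphi)
a = sum(al * xi for al, xi in zip(alpha, x))   # robust centroid
mean = sum(fi * xi for fi, xi in zip(f, x))    # weighted mean, 3.25
```

Only the distance matrix `D` enters the iteration; the coordinates `x` are used afterwards to read off the centroid. With the identity transformation (`dphi` constant) the profile stays at $\alpha = f$, which recovers the weighted mean.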

Section 2 defines the main ingredients and presents general results, in
particular the existence of two regimes of solutions. Section 3 shows the
connection with the theory of M-estimates in one dimension, illustrated by the
copper concentration data in Section 4. Section 5 illustrates the
multidimensional case by means of the chi-square distances occurring in
correspondence analysis.



2 Definitions and general results 

Data are characterized by the matrix of dissimilarities $D_{ij}$ between
observations $i,j = 1,\dots,n$, together with their weights $f_i > 0$ with $\sum_i f_i = 1$.
We assume the dissimilarities to be squared Euclidean, that is of the form
$D_{ij} = \|x_i - x_j\|^2$ where the coordinates $x_i \in \mathbb{R}^p$, unique up to a
translation and a rotation, can be recovered by MDS (see e.g. Mardia et al. 1979;
Borg and Groenen 1997; Bavaud 2011; and references therein), in a space of
dimension $p \le n-1$.

Here and in the sequel, we assume that observations are distinct, that is
$D_{ij} > 0$ for $i \neq j$. That is, possibly identical initial observations $x_i = x_j$
should be first aggregated into a single observation of weight $f_i + f_j$.



2.1 Huygens principles 



The inertia $\Delta(f)$, measuring the dispersion of the weighted configuration,
can be expressed in either of two equivalent forms (Huygens weak principle)

$$\Delta(f) = \frac{1}{2}\sum_{ij} f_i f_j D_{ij} = \sum_i f_i D_{if} \qquad (3)$$

where $D_{if} = \|x_i - \bar{x}_f\|^2$ and $\bar{x}_f = \sum_i f_i x_i$. Also (Huygens strong principle)

$$\sum_j f_j D_{ij} = D_{if} + \Delta(f)\,. \qquad (4)$$

Consider another distribution or profile $\alpha$ with $\sum_i \alpha_i = 1$, with associated
centroid $a = \sum_i \alpha_i x_i$. Substituting $\alpha$ for $f$ in (4) yields the second identity in (2).
Also, $\sum_i f_i D_{ia} = D_{fa} + \Delta(f)$, which shows $\Gamma(a)$ in (1) to be minimum
for $a = \bar{x}_f$ in the identity case $\varphi(D) = D$, an elementary result.
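
Both Huygens principles are easy to check numerically. The following sketch does so on a small weighted configuration in the plane; the four points and their weights are made up for illustration and do not come from the paper.

```python
# Hypothetical weighted configuration in the plane.
x = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (4.0, 1.0)]
f = [0.1, 0.2, 0.3, 0.4]


def sqdist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


n = len(x)
D = [[sqdist(x[i], x[j]) for j in range(n)] for i in range(n)]
mean = tuple(sum(f[i] * x[i][k] for i in range(n)) for k in range(2))
D_if = [sqdist(x[i], mean) for i in range(n)]

# Weak principle: the pairwise form of the inertia equals the centred form.
inertia_pairs = 0.5 * sum(f[i] * f[j] * D[i][j] for i in range(n) for j in range(n))
inertia_centred = sum(f[i] * D_if[i] for i in range(n))

# Strong principle: sum_j f_j D_ij = D_if + inertia, for every i.
strong_gap = max(
    abs(sum(f[j] * D[i][j] for j in range(n)) - D_if[i] - inertia_pairs)
    for i in range(n)
)
```

The strong principle is what allows the distances $D_{ia}$ to a centroid to be computed from the pairwise distances alone, without ever forming coordinates.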



2.2 Schoenberg transformations 

A Schoenberg transformation is a componentwise mapping $\tilde{D}_{ij} = \varphi(D_{ij})$ with
the property that if $D_{ij}$ represents a squared Euclidean distance between
observations $i$ and $j$, so does $\tilde{D}_{ij}$ (irrespective of the dimension $p$).

The class of all Schoenberg transformations has been investigated and 
determined by Schoenberg (1938, Theorem 6 p. 828; Bavaud 2011): 

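Theorem 1 (Schoenberg 1938). The function $\varphi(D)$ is a Schoenberg
transformation iff it is of the form

$$\varphi(D) = \int_0^\infty \frac{1 - \exp(-\lambda D)}{\lambda}\, g(\lambda)\, d\lambda \qquad (5)$$

where $g(\lambda)\,d\lambda$ is a non-negative measure on $[0,\infty)$ such that $\int_1^\infty \frac{g(\lambda)}{\lambda}\, d\lambda < \infty$.

Equivalently, $\varphi(D)$ is a Schoenberg transformation iff it is smooth, with
$\varphi(0) = 0$, $\varphi^{(2r-1)}(D) \ge 0$ and $\varphi^{(2r)}(D) \le 0$ for all $r = 1, 2, \dots$

The second part is a consequence of Bernstein's theorem on completely
monotonic functions. In particular, a Schoenberg transformation is increasing,
concave, and zero at the origin. Examples are provided by (Bavaud 2011):

$$\begin{aligned}
\varphi(D) &= D^q && (0 < q < 1) && \text{power transformation} \\
\varphi(D) &= 1 - \exp(-D/\delta) && (\delta > 0) && \text{exponential transformation} \qquad (6) \\
\varphi(D) &= \ln(1 + D/\delta) && (\delta > 0) && \text{logarithmic transformation.}
\end{aligned}$$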

A transformation $\varphi(D)$ is said to be rectifiable if $\varphi'(0) < \infty$, that is iff
$\int_0^\infty g(\lambda)\, d\lambda < \infty$. By (5), rectifiable transformations obtain as mixtures of
$(1 - \exp(-\lambda D))/\lambda$, which tends to the identity $\tilde{D} = D$ for $\lambda \to 0$.

A transformation $\varphi(D)$ is said to be bounded if $\varphi(\infty) < \infty$, that is iff
$\int_0^\infty \frac{g(\lambda)}{\lambda}\, d\lambda < \infty$. By (5), bounded transformations obtain as mixtures of
$1 - \exp(-\lambda D)$, which tends for $\lambda \to \infty$ to the discrete metric $\tilde{D} = I(D > 0)$,
attributing a unit dissimilarity between distinct observations.

The power transformation is neither rectifiable nor bounded; the exponential
transformation is rectifiable and bounded; the logarithmic transformation is
rectifiable but not bounded.
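
The classification above can be illustrated numerically. The sketch below implements the three transformations of (6) and probes the slope near the origin (rectifiability) and the value far from the origin (boundedness); the helper `slope_at_zero` and the default parameters are illustrative assumptions.

```python
import math


def power(D, q=0.5):
    """Power transformation D^q: neither rectifiable nor bounded."""
    return D ** q


def exponential(D, delta=1.0):
    """Exponential transformation 1 - exp(-D/delta): rectifiable and bounded."""
    return -math.expm1(-D / delta)


def logarithmic(D, delta=1.0):
    """Logarithmic transformation ln(1 + D/delta): rectifiable, not bounded."""
    return math.log1p(D / delta)


def slope_at_zero(phi, h=1e-6):
    """Forward-difference estimate of phi'(0)."""
    return (phi(h) - phi(0.0)) / h


for phi in (power, exponential, logarithmic):
    # Slope near the origin, then the value at a very large distance.
    print(phi.__name__, slope_at_zero(phi), phi(1e6))
```

For the power transformation the estimated slope keeps growing as `h` shrinks, reflecting $\varphi'(0) = \infty$, whereas it settles near $1/\delta$ for the other two.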



2.3 High-dimensional embedding and strain

To the $n \times n$ matrix of transformed squared distances $\tilde{D}_{ij}$ correspond, by
MDS, new coordinates $\tilde{x}_i \in \mathbb{R}^{\tilde{p}}$, unique up to a rotation and translation.
Thus, a Schoenberg transformation induces a mapping or embedding $\tilde{x} = \eta(x)$
of the original coordinates into the transformed coordinates, similar to the
high-dimensional embeddings of Machine Learning (Bavaud 2011).

Let $\tilde{a}$ denote a position in the transformed space, and define

$$\tilde{\Gamma}(\tilde{a}) = \sum_i f_i\, \|\tilde{x}_i - \tilde{a}\|^2\,.$$

On one hand,

$$\min_{\tilde{a}} \tilde{\Gamma}(\tilde{a}) = \frac{1}{2}\sum_{ij} f_i f_j \tilde{D}_{ij} = \tilde{\Delta}(f)$$

in view of Section 2.1. On the other hand, minimizing the transformed inertia
$\Gamma(a) = \tilde{\Gamma}(\eta(a))$ amounts to minimizing $\tilde{\Gamma}(\tilde{a})$ under the additional constraint
$\tilde{a} = \eta(a)$ for some $a$. The importance of that constraint, reflecting
the non-linearity of the embedding $\eta$, can be measured by the strain
$\gamma(a) = \Gamma(a)/\tilde{\Delta}(f)$, obeying $\gamma(a) \ge 1$ by construction.



2.4 Behavior of minima 

Two types of minima exist: distributed minima, where the identities (2) hold,
and $\alpha$ is strictly positive at each observation; and concentrated minima, where
$\alpha$ is concentrated on some observation $i_0$, thus making $a = x_{i_0}$. The latter
case holds iff the quantity $\max_i \varphi'(D_{ia})$ is infinite.

Theorem 2.

1a) when the transformation is rectifiable, a minimizer $a = \sum_i \alpha_i x_i$ of
$\Gamma(a)$ necessarily satisfies the identities in (2) (distributed case).

1b) the minimizer also necessarily satisfies the stability condition

$$\min_b \sum_i f_i \left[\varphi'(D_{ia}) + 2\varphi''(D_{ia})\, D_{ia} \cos^2\theta^a_{ib}\right] \ge 0 \qquad (7)$$

where $b$ is another point and $\theta^a_{ib}$ the angle between the vectors $x_i - a$ and $b - a$.
In particular, $a$ is stable if

$$\sum_i f_i \left[\varphi'(D_{ia}) + 2\varphi''(D_{ia})\, D_{ia}\right] \ge 0\,. \qquad (8)$$

2) when the transformation is not rectifiable, a minimizer either behaves as
in the distributed case 1), or in a concentrated way, with support concentrated
on a single observation $x_{i_0}$.

Proof. See the appendix for a proof of 1a) and 1b). To prove 2), note that
equation (2) is not justified anymore iff $\varphi'(D_{ia}) = \infty$ for some $i$. As $\varphi'(D)$
is always finite for $D > 0$, then, necessarily, $D_{ia} = 0$ and $\varphi'(0) = \infty$
(non-rectifiable transformation). The observations being distinct, this situation
depicts a centroid concentrated on a unique $i = i_0$ (concentrated case). In
particular, this regime necessarily holds whenever the distributed stability
condition (7) is violated. □

The concentration of the minimizing profile $\alpha$ can be measured by its
entropy $H(\alpha) = -\sum_i \alpha_i \ln \alpha_i \ge 0$, where $H(\alpha) = 0$ iff the profile is
concentrated.



3 The univariate case 



In one dimension, minimizing $\Gamma(a) = \sum_i f_i\, \varphi((x_i - a)^2)$ yields

$$0 = \sum_i f_i\, \varphi'((x_i - a)^2)\,(x_i - a) = \sum_i f_i\, \psi(x_i - a)\,.$$

The odd function defined for $D \ge 0$ as

$$\psi(\sqrt{D}) = \varphi'(D)\sqrt{D} \qquad (9)$$

is known as the "$\psi$-function" in the theory of M-estimators (e.g. Huber 1964,
Hampel et al. 1986, Maronna et al. 2006 and references therein). It is fair
to add that the iterative scheme (2) also generalizes the W-estimation of
location proposed by Tukey (1977), itself identified as a particular case of the
M-estimation (Hampel et al. 1986 p. 116).

The sign of the derivative

$$\chi(D) := \psi'(\sqrt{D}) = \varphi'(D) + 2\varphi''(D)\,D \qquad (10)$$

governs the stability of the M-estimate, in accordance with Theorem 2 1b),
where conditions (7) and (8) now coincide in view of $\cos\theta^a_{ib} = \pm 1$ in one
dimension.



3.1 Power transformation 

The power transformation $\varphi(D) = D^q$ with $0 < q < 1$ is neither rectifiable nor
bounded. It exhibits both regimes, namely, distributed solutions for $q > 1/2$
and concentrated ones for $q < 1/2$, as demonstrated by the study of the sign
of $\chi(D) = q(2q - 1)D^{q-1}$ (see Figure 3).
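
For completeness, the expression of $\chi(D)$ for the power transformation follows from the definition (10) in two steps; the short derivation below is a worked check rather than part of the original argument.

```latex
\begin{align*}
\varphi(D) &= D^{q}, \qquad \varphi'(D) = q\,D^{q-1}, \qquad \varphi''(D) = q(q-1)\,D^{q-2} \\
\chi(D) &= \varphi'(D) + 2\,\varphi''(D)\,D
         = q\,D^{q-1} + 2q(q-1)\,D^{q-1}
         = q(2q-1)\,D^{q-1}
\end{align*}
```

Since $q > 0$ and $D^{q-1} > 0$ for $D > 0$, the sign of $\chi(D)$ is the sign of $2q - 1$, whatever the value of $D$.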



3.2 Rectifiable transformations 

Transformations satisfying $\varphi'(0) < \infty$ can be written as

$$\varphi(D) = \eta(\delta)\, h\!\left(\frac{D}{\delta}\right) \qquad h(0) = 0 \qquad h'(0) = 1 \qquad (11)$$

where $\delta$ is a (squared) characteristic length, and $\eta(\delta)$ is immaterial for our
purpose. Such are the exponential and logarithmic transformations of Section
2.2, as well as the Tukey and Huber transformations

$$\varphi(D) = \begin{cases} \frac{1}{3}\left[1 - \left(1 - \frac{D}{\delta}\right)^3\right] & \text{if } D \le \delta \\ 1/3 & \text{otherwise} \end{cases} \qquad \text{Tukey transformation}$$

$$\varphi(D) = \begin{cases} D & \text{if } D \le \delta \\ 2\sqrt{\delta D} - \delta & \text{otherwise} \end{cases} \qquad \text{Huber transformation.} \qquad (12)$$

As a matter of fact, the $\psi$-functions associated to those transformations can
be shown to respectively yield the so-called Tukey bisquare and the Huber
function, familiar in the theory of M-estimates, whence their names. The
derivatives of the Tukey and the Huber transformations follow the pattern
of Theorem 1 except at the value $D = \delta$, presenting a discontinuity in the
third, respectively second-order derivative.
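
The correspondence with the classical $\psi$-functions can be checked numerically through the relation $\psi(u) = \varphi'(u^2)\,u$ of (9). In the sketch below, the Huber transformation is taken as $D$ for $D \le \delta$ and $2\sqrt{\delta D} - \delta$ otherwise, and the Tukey transformation is assumed to be normalized as $\frac{1}{3}[1 - (1 - D/\delta)^3]$ for $D \le \delta$ and $1/3$ otherwise, so that its $\psi$-function is the bisquare up to the immaterial factor $1/\delta$. The function names and the central-difference step are illustrative choices.

```python
import math


def huber_phi(D, delta):
    """Huber transformation: linear up to delta, square-root growth beyond."""
    return D if D <= delta else 2.0 * math.sqrt(delta * D) - delta


def tukey_phi(D, delta):
    """Tukey transformation (assumed normalization), constant beyond delta."""
    return (1.0 - (1.0 - D / delta) ** 3) / 3.0 if D <= delta else 1.0 / 3.0


def psi(phi, u, delta, h=1e-6):
    """psi(u) = phi'(u^2) u, with phi' estimated by a central difference."""
    D = u * u
    dphi = (phi(D + h, delta) - phi(D - h, delta)) / (2.0 * h)
    return dphi * u


c = 2.0            # cut-off on the original scale
delta = c * c      # squared characteristic length

# Huber function: psi(u) = u inside [-c, c], clipped to +/- c outside.
huber_inside = psi(huber_phi, 1.0, delta)
huber_outside = psi(huber_phi, 3.0, delta)

# Tukey bisquare: psi(u) = u (1 - u^2/delta)^2 / delta inside, zero outside.
tukey_inside = psi(tukey_phi, 1.0, delta)
tukey_outside = psi(tukey_phi, 3.0, delta)
```

The evaluation points are deliberately kept away from $u = c$, where the finite difference would straddle the kink of the higher derivatives.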

Theorem 3 (rectifiable transformations; proof in the appendix).

There exists a finite characteristic length $\delta_0$ granting the uniqueness of the
minimizer $a$ for $\delta \ge \delta_0$, with $\lim_{\delta\to\infty} a = \bar{x}_f$.

3.3 Bounded transformations

Bounded transformations can be written as

$$\varphi(D) = \eta(\delta)\, h\!\left(\frac{D}{\delta}\right) \qquad h(0) = 0 \qquad h(\infty) = 1\,. \qquad (13)$$

Such is the case of the discrete metric $\varphi(D) = I(D > 0)$, attributing a unit
dissimilarity between distinct observations. The transformed inertia reads
$\Gamma(a) = 1 - \sum_i f_i\, I(a = x_i)$, and attains its minimum values $1 - f_i$ when
$a = x_i$, thus generating concentrated solutions only.

This behavior is emblematic of bounded transformations in general (rectifiable
or not), which tend towards the discrete metric in the limit $\delta \to 0$
(Section 2.2). Hence, by continuity, $n$ minima emerge in the limit of small
characteristic length:

Theorem 4 (bounded transformations). There exists a finite characteristic
length $\delta_0$ granting the existence of $n$ minima, for $\delta \le \delta_0$, each being
located near an observation $x_i$.




Fig. 1 Transformed inertia $\Gamma(a)$ (arbitrary units) versus $a$. Left: exponential transform
$\varphi(D) = 1 - \exp(-\lambda D)$ with $\lambda$ = 0.1, 1, 10 and 1000, in ascending order. Right: power
transform $\varphi(D) = D^q$ with $q$ = 0.7, 0.5, 0.3 and 0.1, in ascending order. The bottom line
depicts the values of 15, respectively 3 weighted observations, "x" denotes $\bar{x}_f$, and the
segment denotes the median interval.

4 Illustration: copper concentrations data



Consider the dataset chem, available in the R library MASS (Venables and 
Ripley 2002) consisting of n = 24 copper concentrations in wholemeal flour 
(Abbey, 1988), with sorted values 

2.20 2.20 2.40 2.40 2.50 2.70 2.80 2.90 3.03 3.03 3.10 3.37 
3.40 3.40 3.40 3.50 3.60 3.70 3.70 3.70 3.70 3.77 5.28 28.95 



Ties occur twice (x = 2.20, 2.40, 3.03), three times (x = 3.40) or four
times (x = 3.70), and must be preliminarily aggregated, resulting in a sample
of n = 16 observations with varying weights.
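
The aggregation of ties can be sketched as follows, using the 24 values listed above; the variable names are illustrative.

```python
from collections import Counter

# Copper concentrations (dataset chem), as listed above.
chem = [2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03, 3.10, 3.37,
        3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70, 3.70, 3.77, 5.28, 28.95]

counts = Counter(chem)                      # multiplicity of each distinct value
x = sorted(counts)                          # distinct observations
f = [counts[v] / len(chem) for v in x]      # aggregated weights, summing to one

mean = sum(fi * xi for fi, xi in zip(f, x)) # weighted mean of the aggregated sample
```

The aggregated sample has 16 distinct observations, and its weighted mean coincides with the ordinary mean of the 24 raw values.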

Figure 1 plots the functions $\Gamma(a)$ resulting from the exponential and the
power transform, for varying parameters. It illustrates the gradual emergence
of $n$ minima from the mean $\bar{x}_f = 4.2804$, in accordance with Section 3. In the
power case at the transition value $q = 1/2$, the minimum is attained at the
median interval $x_{0.5} = (3.37, 3.40)$ (second plot of Figure 1 right).

Figures 2 and 3 exhibit the values $a$ resulting from the iteration of equations
(2), for various transformations and various ranges of parameters, starting
with an initial value drawn as $a_0 \sim U(2, 6)$ (which de facto excludes the
solutions around x = 28.95, for readability's sake).

The rectifiable transformations (Figure 2) yield distributed solutions (in
accordance with 1a) and 1b) of Theorem 2), unique for large characteristic
length (Theorem 3), and, in the case of bounded exponential and Tukey
transformations, "quasi-concentrated" for small characteristic length
(Theorem 4).

By contrast, the power transformation (Figure 3) exhibits a phase transition
at $q = 1/2$ (the median) between distributed and concentrated regimes
(Section 3.1 and Theorem 2). The value of the strain can be shown to converge
for $q \to 0$ to $2(1 - f_{i_0})/(1 - \sum_i f_i^2)$, where $f_{i_0}$ is the weight of the
observation on which the minimum is concentrated; it ranges from 1.82 (ties
occurring four times) to 2.09 (no ties).
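
The two limiting values of the strain can be reproduced exactly from the tie multiplicities of the copper data. One way to see the limit, assuming the power transformation tends to the discrete metric as $q \to 0$, is that the transformed inertia at a concentrated minimum becomes $1 - f_{i_0}$ while the transformed inertia of the sample becomes $\frac{1}{2}(1 - \sum_i f_i^2)$. The sketch below uses exact fractions; the function name is illustrative.

```python
from fractions import Fraction

# Tie multiplicities in the aggregated copper sample: eleven single values,
# three pairs, one triple and one quadruple (24 measurements, 16 observations).
multiplicities = [1] * 11 + [2] * 3 + [3] + [4]
f = [Fraction(m, 24) for m in multiplicities]

denominator = 1 - sum(fi ** 2 for fi in f)   # 1 - sum_i f_i^2


def limiting_strain(f_i0):
    """Strain 2 (1 - f_i0) / (1 - sum_i f_i^2) for a minimum concentrated on i0."""
    return 2 * (1 - f_i0) / denominator


lowest = limiting_strain(max(f))    # minimum concentrated on the four-fold tie
highest = limiting_strain(min(f))   # minimum concentrated on an untied observation
```

The exact values are 20/11 and 23/11, which round to the 1.82 and 2.09 quoted above.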

As expected, the value of the entropy is zero in the concentrated regime.
Figure 3 bottom right demonstrates its deceptive behavior when the
aggregation of ties is neglected. Also, the entropy dramatically decreases when the
distributed solution crosses the sample values.

5 The multivariate case

The robust estimation scheme (2) depends on the $n \times n$ distance matrix only,
and not on the coordinates nor on their dimension. The multivariate nature
of data is then of little concern, in contrast to early attempts at generalizing
M-estimates (such as the "peeling" or the "iterative trimming" schemes
cited in Maronna (1976)). Rather, the success of the multidimensional
implementation boils down to the ability to define a univocal squared Euclidean
distance for multidimensional data. This is typically the case for the
chi-square metric of Correspondence Analysis (CA) (Benzécri 1973, Greenacre
1984), illustrated below.



5.1 Illustration: scientific collaborations of India 



Fig. 3 Location estimates for the copper concentrations data, using the power
transformation $\varphi(D) = D^q$, for various values of $q$ (top left). Corresponding strain (top right)
and entropies (bottom), exhibiting spurious, non-zero values when ties are not aggregated
(bottom right).

Consider the contingency table $N = (n_{ig})$ giving the number of scientific
publications (1993-2000) co-authored by India and foreign countries $i = 1, \dots, 29$
(rows), sorted by disciplines $g = 1, \dots, 22$ (columns) (Anuradha and Urs
2007). The natural distance between countries is the chi-square distance

$$D_{ij} = \sum_g \frac{n_{\bullet\bullet}}{n_{\bullet g}} \left(\frac{n_{ig}}{n_{i\bullet}} - \frac{n_{jg}}{n_{j\bullet}}\right)^2 \qquad f_i = \frac{n_{i\bullet}}{n_{\bullet\bullet}} \qquad (14)$$

where the dot symbol sums over all values of the corresponding index. The
associated inertia $\Delta(f)$ is, up to the total count, the chi-square measure of
rows-columns dependence; the determination of its low-dimensional
projection defines the well-known Correspondence Analysis procedure.
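
The link between the inertia and the chi-square statistic can be verified on a toy example. The sketch below assumes the standard chi-square distance between row profiles, with row weights $f_i = n_{i\bullet}/n_{\bullet\bullet}$; the 3 x 3 table is made up for illustration and is unrelated to the collaboration data.

```python
# Made-up 3 x 3 contingency table, for illustration only.
N = [[20, 5, 10],
     [4, 16, 6],
     [9, 3, 27]]

rows, cols = len(N), len(N[0])
total = sum(map(sum, N))
row_sum = [sum(N[i]) for i in range(rows)]
col_sum = [sum(N[i][g] for i in range(rows)) for g in range(cols)]
f = [r / total for r in row_sum]            # row weights f_i


def chi2_dist(i, j):
    """Chi-square squared distance between the profiles of rows i and j."""
    return sum(
        total / col_sum[g] * (N[i][g] / row_sum[i] - N[j][g] / row_sum[j]) ** 2
        for g in range(cols)
    )


D = [[chi2_dist(i, j) for j in range(rows)] for i in range(rows)]
inertia = 0.5 * sum(f[i] * f[j] * D[i][j] for i in range(rows) for j in range(rows))

# Pearson chi-square statistic of independence, computed directly.
chi2 = sum(
    (N[i][g] - row_sum[i] * col_sum[g] / total) ** 2 / (row_sum[i] * col_sum[g] / total)
    for i in range(rows) for g in range(cols)
)
```

The inertia computed from the distance matrix equals `chi2 / total`, which is the sense in which it measures the rows-columns dependence up to the total count.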

CA extracts uncorrelated factorial coordinates $x_{i\beta}$ such that
$D_{ij} = \sum_{\beta \ge 1} (x_{i\beta} - x_{j\beta})^2$, each dimension $\beta$ accounting for a proportion of explained
inertia, decreasing with $\beta$. Figure 4 left exhibits the countries' coordinates in
the first and second dimensions. The size of the circles depicts the countries'
weights, maximum for the USA, whose coordinates are also the closest
to the origin.




Fig. 4 Left: low-dimensional factorial coordinates of the countries, expressing 60% of the 
inertia. Right: zoom of the rectangle in the left figure, depicting the positions of the phi- 
estimate of location, under power (X), exponential (0) and logarithmic (-) transformations, 
with parameters ranging in the distributed regime. 



Figure 4 right depicts the plane trajectories, under various transformations
in the distributed regime, of the centroid projections $\sum_i \alpha_i x_{i\beta}$ for $\beta = 1, 2$,
where $\alpha$ is the profile resulting from (2). All trajectories, diverse in shape and
small in amplitude, lead from the mean at the origin $\bar{x}_{f\beta} = \sum_i f_i x_{i\beta} = 0$
(circle) to the coordinates of the USA (triangle).



6 Conclusion 



Classical multidimensional scaling extracts the coordinates of the objects, 
uniformly weighted or not, from their Euclidean inter-distances only. In the 
same circumstances, this paper proposes a class of robust location estimates, 
based upon a Schoenberg transformation of distances, and exhibiting a
wide-ranging behavior. Further studies on their sensitivity, breakdown points and
the like should presumably help address the question "which transformation
should be used in which context", left aside in the present set-up.

Additional guidance is also expected from a probabilistic point of view,
deliberately avoided as well in this paper. Manifestly, Schoenberg
transformations can serve in generating new densities: typically, replacing the distances
in a univariate normal distribution by their square root produces an
exponential distribution, which constitutes, in the spirit of Section 2.3, a normal
distribution in some higher-dimensional embedded space. In that context,
maximum likelihood considerations should also help in selecting a suitable
family of transformations, whose Euclidean nature constitutes a favorable
circumstance regarding their tractability.

7 Appendix

Proof (of Theorem 2). The derivative of a rectifiable transform is bounded.
Setting to zero the derivative of $\Gamma(a)$ with respect to the $\ell$-th component of
the centroid $a = \sum_i \alpha_i x_i$ yields

$$-2 \sum_i f_i\, \varphi'(D_{ia})\,(x_{i\ell} - a_\ell) = 0$$

which is the first identity in (2), the second one resulting from the strong
Huygens principle of Section 2.1. This solution is a minimum if, for all $t$,

$$0 \le \sum_{k\ell} \frac{\partial^2 \Gamma(a)}{\partial a_k\, \partial a_\ell}\, t_k t_\ell = 2\|t\|^2 \sum_i f_i\, \varphi'(D_{ia}) + 4 \sum_i f_i\, \varphi''(D_{ia})\, ((x_i - a)'t)^2\,.$$

Setting $t = b - a$ yields $((x_i - a)'t)^2 = D_{ia}\, \|t\|^2 \cos^2\theta^a_{ib}$, proving (7). Equation
(8) follows from $\varphi''(D) \le 0$ and $\cos^2\theta^a_{ib} \le 1$. □

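An alternative, less direct proof is also obtained by perturbing
the profile elements $\alpha_j$ rather than the centroid components $a_\ell$. The second
identity in (2) yields

$$\frac{\partial D_{ia}}{\partial \alpha_j} = D_{ij} - \sum_k \alpha_k D_{jk} = D_{ij} - D_{ja} - \Delta(\alpha)$$

and hence the minimum condition

$$0 = \frac{\partial \Gamma(a)}{\partial \alpha_j} = \sum_i f_i\, \varphi'(D_{ia})\,(D_{ij} - D_{ja} - \Delta(\alpha))$$

which, by comparison to the identity $\sum_i \alpha_i (D_{ij} - D_{ja} - \Delta(\alpha)) = 0$, yields
the first identity in (2). Further differentiating with respect to $\alpha_k$ yields, together
with (2), the stability condition

$$\sum_{jk} \frac{\partial^2 \Gamma(a)}{\partial \alpha_j\, \partial \alpha_k}\, z_j z_k = -\sum_i f_i\, \varphi'(D_{ia}) \sum_{jk} D_{jk} z_j z_k + \sum_i f_i\, \varphi''(D_{ia}) \Big[\sum_j (D_{ij} - D_{ja} - \Delta(\alpha))\, z_j\Big]^2 \ge 0 \qquad (15)$$

where $z$ is an admissible infinitesimal variation of the profile, that is of the
form $z = \beta - \alpha$ where $\beta$ is another profile, with centroid $b = \sum_i \beta_i x_i$. But $\sum_{jk} D_{jk} z_j z_k = -2 D_{ab}$ (e.g.
Bavaud 2011) and

$$\sum_j (D_{ij} - D_{ja} - \Delta(\alpha))\, z_j = D_{ib} - D_{ab} - D_{ia} = -2\sqrt{D_{ab} D_{ia}}\, \cos\theta^a_{ib}$$

by the cosine formula. □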

Proof (of Theorem 3). By hypothesis, $0 < \varphi'(0) < \infty$. From (5),

$$|D\varphi''(D)| = \int_0^\infty \lambda D \exp(-\lambda D)\, g(\lambda)\, d\lambda \le \frac{1}{e} \int_0^\infty g(\lambda)\, d\lambda = \frac{1}{e}\, \varphi'(0)$$

implying $\lim_{D\to 0} D\varphi''(D) = 0$ by dominated convergence. Hence $\chi(0) > 0$,
and, by continuity, $\chi(D) > 0$ for $D$ small enough, or equivalently for $\delta$ large
enough. As a consequence, the stability condition (8) of Theorem 2 holds
irrespective of the value of $a$, that is $\Gamma(a)$ is convex for $\delta$ large enough, and
thus possesses a unique minimum. The latter tends to $\bar{x}_f$ in the limit $\delta \to \infty$,
in view of $\varphi(D) = \frac{\eta(\delta)}{\delta}\,(D + O(D^2/\delta))$. □



